Population genomic analyses reveal that salinity and geographic isolation drive diversification in a free-living protist

Protists make up the vast diversity of eukaryotic life and play a critical role in biogeochemical cycling and in food webs. Because of their small size, cryptic life cycles, and large population sizes, our understanding of speciation in these organisms is very limited. We performed population genomic analyses on 153 strains isolated from eight populations of the recently radiated dinoflagellate genus Apocalathium, to explore the drivers and mechanisms of speciation processes. Species of this genus inhabit both freshwater and saline habitats, lakes and seas, and are found in cold temperate environments across the world. RAD sequencing analyses revealed that the populations were overall highly differentiated, but morphological similarity was not congruent with genetic similarity. While geographic isolation was to some extent coupled to genetic distance, this pattern was not consistent. Instead, we found evidence that the environment, specifically salinity, is a major factor in driving ecological speciation in Apocalathium. While saline populations were unique in loci coupled to genes involved in osmoregulation, freshwater populations appear to lack these. Our study highlights that adaptation to freshwater through loss of osmoregulatory genes may be an important speciation mechanism in free-living aquatic protists.

Our understanding of evolution and mechanisms of speciation is largely based on studies of macroscopic and multicellular organisms. However, the vast diversity of eukaryotes is found within the unicellular microeukaryotes, i.e. protists 1. Yet there is limited knowledge regarding divergence and ultimately speciation in protists. In contrast to most multicellular eukaryotes, protists usually have extremely large population sizes and short generations, and their reproduction is predominantly asexual. Consequently, the effects of genetic drift, bottlenecks, and adaptation, as well as migration rates, are expected to differ among these major life forms. Importantly, studying speciation in protists will provide clues to eukaryotic evolution and the evolution of multicellularity 2.
Settling on a species concept and identifying species is challenging in protists. The most common and practical species concept is the morphospecies, which is based on microscopic morphological differences. The small size and limited morphological variation of protists make it difficult to distinguish closely related taxa. The biological species concept 3 is problematic in protists because some species are strictly asexual, and because sexual reproduction, and thus reproductive isolation, is generally very difficult to detect. Sexual events are often challenging to induce in the laboratory and to identify and quantify in the wild. Thus, determining whether populations can interbreed and produce fertile offspring 4 is usually impossible. However, there is some evidence, at least among green algae, diatoms, and dinoflagellates, that a certain genetic difference is correlated with reproductive isolation 5-8.
The ecological species concept 9 is useful for microorganisms including protists. This concept defines a species as a lineage which occupies an adaptive zone different from any other lineage in its range, and which evolves separately from lineages outside its range, ultimately leading to speciation. Shapiro et al. 10 concur that speciation (in microorganisms) is largely driven by natural selection, followed by genome divergence due to reduced gene flow in recombining species, or due to mutations in clonal lineages. Neutral speciation could potentially occur in cases of drift in conjunction with geographic isolation, but has been regarded as unlikely for microbes, because their large population sizes presumably preclude genetic drift. In addition, geographic isolation is considered unlikely due to their putative high dispersal capacity. However, physical barriers have been shown to be important in the speciation of the marine phytoplankton Gephyrocapsa 11. Rengefors et al. 12 argue that bottlenecks may actually occur during protist population minima and when species invade a new habitat. Moreover, population genetic studies in protists indicate that gene flow among populations is often quite low (see review 12), which may be enough to promote speciation 13.
In this study, we examined the underlying mechanisms that have led to recent speciation in a protist species flock, utilizing a population genomic approach combined with transcriptome data. This species flock consists of closely related lineages of the genus Apocalathium, a planktonic, phototrophic dinoflagellate 14. The lineages occupy similar ecological niches (cold water, mostly under ice) that nevertheless differ in salinity, ranging from freshwater to fully marine systems, which imposes large differences in osmotic stress. The genus occurs in geographically widely separated habitats including the two polar zones, in lakes as well as the ocean 15 (Fig. 1). Apocalathium consists of four different morphospecies: a rounded type (A. malmogiense), a flattened small-spined type (A. aciculiferum), a large spined morphotype (A. baicalense), and a large flattened morphotype (A. euryceps). Neither significant changes in morphology during culturing nor intermediate forms have been found 15. Interestingly, A. malmogiense, A. aciculiferum, A. baicalense, and A. euryceps are found sympatrically in ancient Lake Baikal. The different morphospecies have identical 18S rRNA gene sequences but small differences in LSU and ITS rRNA sequences. Phylogenetic analyses cannot delimit these four morphospecies and gene trees are inconsistent, likely reflecting a recent and rapid adaptive radiation in Apocalathium 15. The secondary structure of the ITS-2 rRNA region shows that the Antarctic lineage and all the other lineages form two separate clusters, suggesting that the Antarctic lineage is reproductively isolated 8. In contrast, a phylotranscriptomic 792-gene analysis using three strains showed that the Baltic Sea strains were more closely related to the Antarctic strains than to the neighboring Swedish freshwater strains 16. The latter indicates that environment (salinity) could be an important driver in the speciation of Apocalathium.
The specific aims of this work were thus to (1) determine genetic differentiation among populations of Apocalathium in relation to morphospecies and origin, (2) determine whether geographic isolation or salinity was the most important segregating mechanism, and (3) explore differences in expressed genes between freshwater and saline lineages. Our approach was to perform a population genomic study by generating high-throughput sequencing data in a large number of strains from multiple sites where Apocalathium occurs. The resulting data were used to determine population genetic structure and gene flow, and to identify loci that differ between freshwater and saline populations.

Results
We sampled Apocalathium at eight different locations representing two different morphospecies, different habitats (freshwater and saline), and geographic locations (Scandinavia, Baltic Sea, Siberia, and Antarctica) (see "Methods", Fig. 1, Supplementary Table 1). Two Scandinavian freshwater lake populations belonged to the morphospecies A. aciculiferum. All other populations belonged to the morphospecies A. malmogiense. Multiple single-cell isolates were cultivated from each location for population genetic analyses. Due to a large genome (~30 Gbp) and the lack of a reference genome, standard Restriction-site Associated DNA (RAD) sequencing 17 was applied to obtain Single Nucleotide Polymorphism (SNP) markers. In total, 153 strains from 8 different locations were sequenced, and a final set of 345 shared SNP loci was used for population genetic analyses (see "Supplementary information").

Variable within-population genetic diversity
To gain insight into the evolutionary histories of the different populations we calculated various within-population metrics. The highest numbers of total and variant shared RAD loci were found in the Antarctic and Baltic populations, while the Lake Baikal population had the highest number of private loci, suggesting a more independent evolutionary track (Table 1). Lake Baikal is by far the oldest of the lakes, at 25 million years, and the dinoflagellate populations there are hypothesized to have colonized it between 5 million and 12,000 years ago 18. In contrast, the other two freshwater lakes must have been colonized after the last glaciation, i.e. less than 20,000 years ago 19, whereas Lake Baikal was not frozen during the last ice age 20. The lowest genetic variation (nucleotide diversity and population heterozygosity) was found in the Antarctic populations, possibly reflecting a more recent colonization, followed by genetic drift and isolation. The Antarctic lakes are estimated to have been isolated from the sea only ~6000 years ago 21 and are ice-covered most of the time, likely minimizing new immigration 22. Nucleotide diversity was highest in the Lake Erken population, as was population heterozygosity (Table 1), even though the lake is estimated to have formed only 3000 years ago, when it separated from the Baltic Sea 23 following land-rise.
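As an illustration of the nucleotide diversity (π) metric discussed above, the following is a minimal sketch of one standard unbiased per-site estimator for haploid allele calls (the probability that two distinct sampled strains differ at a site). It is not taken from the Stacks or Genodive source code; the function names `site_pi` and `mean_pi` and the input layout are assumptions made for this example.

```python
from collections import Counter


def site_pi(alleles):
    """Unbiased per-site nucleotide diversity for haploid allele calls.

    `alleles` holds one call per strain (e.g. "A", "T"); None marks
    missing data and is ignored. The value equals the fraction of
    strain pairs that carry different alleles at this site.
    """
    calls = [a for a in alleles if a is not None]
    n = len(calls)
    if n < 2:
        return 0.0  # diversity is undefined for fewer than two calls
    counts = Counter(calls)
    # Probability that two strains drawn without replacement are identical.
    identical = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return 1.0 - identical


def mean_pi(sites):
    """Average per-site pi over a list of sites (hypothetical helper)."""
    return sum(site_pi(s) for s in sites) / len(sites)
```

For a hypothetical population of four strains, `site_pi(["A", "A", "T", "T"])` counts four differing pairs out of six. Averaging over variant and fixed sites, as in `mean_pi`, lowers the value in proportion to the number of fixed sites included.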

High genetic differentiation among populations
Pairwise comparisons of dinoflagellate populations from all eight locations showed high and significant differences between all geographic regions (Phi ST values between 0.82 and 0.97; Table 2). In contrast, within the Baltic Sea, the two populations Tvärminne and Gulf of Finland, which are hydrologically connected, were not significantly differentiated. Similarly, the Antarctic populations in lakes Highway and McNeil, which are less than 10 km apart, were not significantly different. However, genetic distance was not always correlated with geographic distance, and while the Mantel test of Isolation-By-Distance (IBD) showed significant (p = 0.001) genetic isolation with geographic distance, geographic distance explained only 38.9% of the variation (Fig. 2A). For instance, the Antarctic Lake Vereteno population was significantly different from the other two Antarctic populations.

Table 1. Summary statistics from Stacks-population runs of all populations. Data reported include the total number of shared RAD sites (variant and fixed), the number of variant loci, the number of private loci, nucleotide diversity (π), and population-level heterozygosity from Genodive. Populations were labeled by region and sampling location. The regions include Scandinavia (SCA), Siberia (SIB), Baltic Sea (BAL), and Antarctica (ANT).
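The isolation-by-distance test reported in this section can be illustrated with a minimal permutation-based Mantel test. This is a sketch under stated assumptions, not the software used in the study: the function names, the one-sided p-value, and the default of 999 permutations are all choices made for this example. Reading the reported 38.9% as the squared Mantel correlation is likewise an assumption.

```python
import random
from math import sqrt


def upper(matrix, order=None):
    """Flatten the upper triangle of a symmetric matrix.

    `order` optionally relabels the sites (rows and columns together).
    """
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test of positive association.

    Site labels of the genetic matrix are shuffled to build the null
    distribution; the observed arrangement counts as one permutation.
    """
    rng = random.Random(seed)
    geo = upper(geographic)
    r_obs = pearson(upper(genetic), geo)
    n = len(genetic)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(upper(genetic, order), geo) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Given two symmetric site-by-site matrices (pairwise Phi ST and pairwise distance in km), `r, p = mantel(phi_st, km)` returns the correlation and its permutation p-value. `r ** 2` would then be the share of variation in genetic distance associated with geographic distance.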

Unique loci when contrasting freshwater and saline populations
To explore differences between freshwater populations and those adapted to brackish saline water habitats, we identified RAD-loci unique to each of those groups ("Supplementary information", Methods). The RAD sequences representing these loci were then mapped against a merged transcriptome database consisting of 506,560 reads from strains originating from freshwater Lake Erken (cultivated at salinities 0 and 3) and from the saline Baltic Sea and Highway Lake (cultivated at salinities 0, 3 and 30). From the unique loci that had both a transcriptome hit and a hit against SwissProt, the top loci differentiating freshwater vs saltwater populations were analyzed further in terms of gene ontology (GO). Loci present in saltwater but not freshwater populations yielded 90 different GO terms (Supplementary Table 2), whereas loci present in freshwater but not saltwater populations accounted for only 6 GO terms (Supplementary Table 2). Saltwater GO-loci were connected to chloride channels, iron transmembrane transport activity, and divalent cation transport, all involved in osmoregulation. Several loci were connected to betaine glycine and glyceraldehyde-3-phosphate dehydrogenase activity, i.e., connected to osmolytes (Table 3). Moreover, loci involved in urea transport as well as cell wall callose deposition were highly represented. GO-loci present in freshwater only were mainly connected to sodium homeostasis and citrate transport (Table 3).

Discussion
Using population genomics, we found strong support for the hypothesis that the environment, specifically salinity, is a major factor in driving ecological speciation in the dinoflagellate Apocalathium. Geographic isolation also plays an important role, showing unambiguously that protists do not disperse at a higher rate than the rate of genetic differentiation, thereby allowing for allopatric speciation. In addition, we found that morphological similarity is not equivalent to genetic identity, demonstrating that the morphospecies concept is not suitable for all protists. A key finding in our study was that the genomic data did not support the current division of Apocalathium into two morphospecies, but rather into four lineages or species: (1) A. malmogiense from the Baltic Sea, (2) A. cf. malmogiense from the Antarctic lakes, (3) freshwater A. aciculiferum, and (4) A. cf. malmogiense from Lake Baikal. Clearly, overall cell morphology is not a good species delineator in Apocalathium. The high pairwise Phi ST values, which are close to 1, also support that these are four different species. Moreover, the fact that only 345 loci (out of 264,000 per individual) were shared under relatively relaxed parameters further reflects the large differences among these populations. Interestingly, A. aciculiferum from Scandinavia and A. cf. malmogiense from Lake Baikal clustered together despite having different morphologies. Possibly the round morphology confers some advantage in the sea and "sea-like" habitats like Lake Baikal, which with its huge depth and volume is more like a sea than a lake. Alternatively, cell morphology is a non-adaptive trait and the two lineages have different evolutionary trajectories.
A striking result was the distinct genetic structure and overall high level of genetic isolation among populations. Within the distinct lineages, there was either no significant differentiation (e.g. Baltic A. malmogiense populations, which are hydrologically connected) or moderate differentiation (A. aciculiferum populations, Antarctic A. cf. malmogiense). Surprisingly, some pairwise lake population comparisons within the same lineage had a relatively high Phi ST (0.4-0.5), suggesting limited gene flow and rapid differentiation in lake populations. Previous studies using Amplified Fragment Length Polymorphism (AFLP) and microsatellites (reviewed in 12) have also shown that phytoplankton lake populations typically are significantly genetically differentiated. Even populations of phytoplankton in the Baltic Sea show fine-scale differentiation despite currents connecting the water masses 24-26. While the mechanisms for this differentiation within species are not well understood, there is clearly limited gene flow among phytoplankton populations despite potentially high dispersal capacity. It has been suggested that monopolization by first colonizers 27,28 together with anchoring through resting cyst seed banks 29 contribute to limited gene flow.
Our results suggest that speciation in Apocalathium has not been driven primarily by geographic isolation. While genetic differentiation among populations was always high at large distances (> 1000 km), it varied considerably at smaller distances. Also, although the clustering analysis mostly separated geographically distant populations, the freshwater Baikal strains clustered with Scandinavian freshwater populations despite being geographically very distant. We interpret this as follows: while geography plays a certain role in protist speciation, local processes, both within lakes and among habitats, are more important. This finding is in line with other population genetic studies of limnic populations, demonstrating that isolation-by-distance falls apart at a certain distance 27.
Here we provide evidence for ecological speciation, driven by habitat salinity, that has led to differentiation of freshwater and marine-brackish water lineages. Salinity has previously been hypothesized as a major driver in the speciation of protists in general 30, including Apocalathium 31. Recent work has demonstrated that freshwater-marine transitions are not as infrequent as proposed earlier, but that freshwater and saline species form phylogenetically distinct groups 32. Our comparison of the freshwater versus the saline lineages indicates that the lineages have different sets of genes. Notably, multiple unique loci in populations from saline habitats were found for genes related to osmoregulation. Several of these were involved in transport and catabolic processes of urea and glycine betaine, which are natural osmolytes that can serve as osmoprotectants 33. By accumulating osmoprotectants the cell can balance the osmotic stress between the cell and the surroundings, and thereby maintain cell turgor and volume 33. Glycine betaine has been found in various marine dinoflagellates associated with increased salinity 34-36. Moreover, in a transcriptome analysis of the dinoflagellate Oxyrrhis marina, genes related to the glycine betaine pathway were upregulated when cells were grown in extremely high salinity (50 psu) 37. In addition to genes related to glycine betaine, there were multiple hits related to chloride channels and sodium transporters. In mammalian cells, maintenance of cell volume is regulated by Na+/Cl− transport across the cell membrane, where shrinkage is counteracted by accumulation of ions through Na+, K+, and 2Cl− transport. Thus, these transcripts are probably also utilized by Apocalathium to maintain cell volume.
While populations from saline water had unique RAD-loci with hits against osmoprotectant-related processes, no such hits were found for the freshwater populations. Instead, the most frequent unique loci were found in genes coupled to regulation of calcium and sodium homeostasis. In freshwater environments, cells experience a hypotonic exterior environment, and water rushes into the cell, causing swelling. Eukaryotic cells have evolved either aquaporins or contractile vacuoles to channel out water. However, contractile vacuoles are absent in dinoflagellates. Instead they have a pusule system which likely takes part in osmoregulation via a different mechanism 38. Klut et al. 39 showed that the dinoflagellate pusule structure is a fibrillar collar system which together with the flagella may be connected to water expulsion. Surprisingly, further studies regarding osmoregulation of freshwater dinoflagellates are lacking. While speculative, the unique hits for freshwater lineages were connected to calcium and sodium homeostasis, as well as renal function-associated genes, suggesting that these could be linked to expulsion of freshwater.
Interestingly, transcripts connected to osmoregulation (glycine betaine, Na+/Cl− transport) were found in saline lineage transcriptomes (A. malmogiense, Antarctic A. cf. malmogiense), but were lacking in the freshwater (A. aciculiferum) transcriptomes. This means that these genes are either not transcribed, found in very low copy number, or not present at all in the freshwater strains. To verify whether the genes are still present in the genome of freshwater lineages, in-depth genome sequencing is needed. However, since most dinoflagellates have constitutively expressed genes, a lack of transcription is a less likely explanation for the majority of the genes 40. This is corroborated by the fact that in the laboratory A. aciculiferum grew at 0 and 3 psu but was unable to sustain growth at 30 psu 31. In contrast, the genes related to putative freshwater expulsion were found in the transcriptomes of all three lineages, even though the RAD-loci hits were unique to freshwater strains. Since saline lineages grew at 0 psu 16, these lineages must have retained the ability to pump out water to maintain cell turgor. Since the RAD-loci were unique to freshwater strains, this suggests that there are genetic differences between the two groups in these genes. A plausible explanation is that there is variation in gene copy number, and that saline lineages have far fewer copies than freshwater lineages, resulting in few RAD-loci, which are lost in the bioinformatic filtering. Dinoflagellates are known to have high gene copy numbers, often structured in tandem repeats, with within-copy variation 41. Another possible, but perhaps less likely, explanation is that there are SNPs in the SbfI cut-site in the saline strains, thereby removing these RAD sites.
The lineages belonging to Apocalathium have previously been proposed to have undergone a recent adaptive radiation 15. The current study supports this hypothesis since the differences between the freshwater and saline populations are not only found in neutral SNPs but also in functional genes. Given that there are multiple genes related to osmoregulatory capacity, this supports the hypothesis that the divergence is adaptive. We hypothesize that the ancestral Apocalathium species was a cold-water euryhaline marine species and that the freshwater species evolved when trapped in glacial lakes following recession of glaciers. However, the A. cf. malmogiense in Lake Baikal may have evolved earlier since Lake Baikal was not frozen during the last glaciation 20. Following adaptation to freshwater, the limnic lineages appear to have lost their ability to osmoregulate in water with salinity above 3. This scenario may also explain the speciation of the closely related dinoflagellates Gymnodinium baicalense and G. corollarium, which inhabit Lake Baikal and the Baltic Sea, respectively, and which differ in their ability to grow in saline water 42. Thus, loss of osmoregulatory genes or switching off of their expression may be an important mechanism in speciation of protists that have transitioned between marine and freshwater environments.
Our study revealed that RAD-sequencing is both a feasible and successful strategy for population genetic/genomic studies in dinoflagellates, and that in combination with transcriptomes it can provide functional information on loci of interest. Given the size of the Apocalathium genomes, the initial concern was that RAD-seq would be unfeasible. Using an 8-cutter restriction enzyme such as SbfI, a total of 916,000 RAD sites were estimated, but only around 264,000 were recovered on average per individual. A plausible explanation is that dinoflagellate genomes contain a large fraction of repetitive elements, being as high as 68% in the polar Polarella glacialis 40. Despite this high number of RAD-sites we were able to sequence enough to have a high coverage per RAD-site. However, the loss of RAD-loci was high when filtering for shared loci (see "Supplementary Information"). We interpret this loss to be due to the large genome size (unequal sequence depth) and high diversity. Nevertheless, sufficient SNPs were recovered to perform the study, making RAD-seq an excellent alternative to whole-genome sequencing, which for these organisms is not feasible.
To conclude, in this study we show that salinity is likely an important driver of population differentiation in the dinoflagellate Apocalathium, but also that geographic isolation plays an important role. The high genetic differentiation and the presumed loss of multiple genes involved in osmoregulation suggest that these lineages should be considered separate species that no longer exchange genes. These results provide evidence of ecological speciation as an important process in the microbial world.
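The estimate of 916,000 RAD sites quoted above can be reproduced with a common back-of-envelope calculation. The sketch below assumes a random genome sequence with equal base frequencies, the ~30 Gbp genome size given in the Results, and two RAD tags (one per flank) for each cut site. Whether the authors derived their figure this way is an assumption, and the function name `expected_rad_tags` is hypothetical.

```python
def expected_rad_tags(genome_bp, recognition_len, tags_per_cut_site=2):
    """Expected RAD tags for a restriction enzyme on a random genome.

    Assumes each base is A, C, G or T with probability 1/4, so a given
    recognition sequence of length k occurs about once every 4**k bp.
    Each cut site yields `tags_per_cut_site` sequenced flanks.
    """
    cut_sites = genome_bp / 4 ** recognition_len
    return cut_sites * tags_per_cut_site


# SbfI is an 8-cutter; genome size taken as ~30 Gbp (from the Results).
estimated = expected_rad_tags(30e9, 8)

# Fraction of the estimate represented by the ~264,000 loci recovered
# on average per individual.
recovered_fraction = 264_000 / estimated
```

Under these assumptions the estimate is roughly 9.16 × 10^5 tags, so the loci recovered per individual amount to a little under 30% of the expectation. Real genomes depart from the equal-base-frequency assumption (SbfI recognizes a GC-rich site), so the calculation is only an order-of-magnitude guide.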

Sampling and isolation of dinoflagellate strains
Strains from the dinoflagellate genus Apocalathium were sampled from 8 different locations (lakes and sea) in four different geographic regions (Scandinavia, Baltic Sea, Siberia, Antarctica) (Supplementary Table 1). Apocalathium consists of four different morphospecies. The rounded type (A. malmogiense) was described from a pond filled with Baltic Sea water 43 and is currently found in saline habitats (Baltic Sea, the Arctic Ocean, brackish Antarctic lakes) and in the ancient freshwater Siberian Lake Baikal 15,44, while a flattened small-spined type (A. aciculiferum) is found in northern temperate lakes including bays of Lake Baikal. In addition, a third, large spined morphotype (A. baicalense) is allegedly endemic to Lake Baikal, and a fourth, large flattened morphotype (A. euryceps) has been encountered in Swedish freshwater lakes and Lake Baikal, but these two are not included in the current study as they could not be cultivated. Strains from the Baltic Sea belonging to the morphospecies A. malmogiense were isolated from material collected at Tvärminne Zoological Station (TV), on the south-west coast of Finland, in 2009 and 2010, and at the monitoring site LL7 in the Gulf of Finland in 2013. Single motile cells were isolated from net tow samples, and cysts from surface sediment slurry incubations, into separate wells of a 24-well tissue culture plate containing 1.5 mL enriched sea water (f/8-Si, salinity of 6.5 45), and incubated at 4 °C with a 14:10 light:dark cycle and 100 µmol photons m⁻² s⁻¹.
The Antarctic strains (morphospecies A. cf. malmogiense) were isolated from brackish-saline lakes during an Antarctic expedition in 2009 as described in 22. Freshwater strains of A. cf. malmogiense originated from Lake Baikal in Siberia, Russia, and freshwater strains of A. aciculiferum from Lake Erken, Sweden, and a pond in central Copenhagen, Denmark, sampled in winter/early spring 2014 (Supplementary Table 1). Three strains from Lake Erken were isolated in 2004. Cells from the freshwater lakes and Antarctic lakes were isolated from plankton samples collected with a 20 µm net. Individual cells were isolated manually, washed three times, and transferred to separate wells of a 48-well tissue culture plate. For the freshwater strains the wells contained 50% sterile-filtered lake water and the remainder artificial MWC medium with a selenium (Se) amendment (see 46). Cultures were grown at 4 °C at a 12:12 LD cycle and 50 µmol photons m⁻² s⁻¹. When cultures had been established, they were further grown in MWC + Se only. The Antarctic strains were first isolated as described in 22. The strains were subsequently transferred to f/2 medium with salinity 7-8, achieved by diluting sterile-filtered seawater with MQ water.

DNA extraction and RAD library preparations
All samples were harvested in 2015 by spinning down 30 ml of culture in mid-late exponential phase for 10 min at 2000 g. DNA extractions were performed using the Qiagen DNeasy Plant Mini Kit (Qiagen) and DNA was quantified by Qubit. We followed a RAD library preparation protocol modified from 47 and 17, as described in 48. For each sample, 1 µg of genomic DNA was digested with 0.5 µl SbfI-HF (NEB, Ipswich, MA, USA). We used 0.5 µl of 2000 U/µl T4 ligase (NEB) in the P1 and P2 adapter ligation steps (for adapter sequences see "Supplementary Information") and decreased the volume of NEB2 buffer (1 µl) used in the P1 adapter ligation. P1 adapters contained unique 7 bp barcodes to allow multiplexing of strains in downstream library preparation, and 3 µl of barcoded P1 adapter (100 µM) were used in each ligation reaction. The final full amplification was performed with 67 ng of DNA template in a 100 µl reaction volume and 18 PCR cycles. The 300-700 bp size fraction of the PCR product was excised and purified from an agarose gel. Twenty uniquely barcoded strains were pooled per lane for sequencing in order to recover at least 8 million reads per sample, meaning at least 40× coverage. Samples were sequenced with Illumina technology at the SNP&SEQ Technology Platform of the SciLifeLab facility in Uppsala, Sweden. Sequencing was performed using Illumina HiSeq2000 v4-chemistry, 125 bp. The R2 reads were not used in the downstream analyses. These sequences have been submitted to BioProject: PRJNA1025931.

RAD/SNP identification
All data were de-multiplexed, quality-checked, and processed using the Stacks software version 1.35 (https://catchenlab.life.illinois.edu/stacks/) 49,50. The analysis pipeline was run manually. Following ustacks, which builds loci, the numbers of retained sequences, RAD tags, and SNPs per sample were collected. Stacks software parameters were tested with a pilot data set using four strains with 4 M reads each. The parameters were chosen with the criteria of maintaining a mean coverage of at least 30 and maximizing the number of utilized reads and polymorphic SNPs, by varying the mismatch (M 0-1) and depth of stack (m 3 to 5) parameters. The final Stacks pipeline run was set with ustacks having the parameters -m 5 -M 0 -N 1 to build the RAD-locus catalog. The cstacks step was run with the number of mismatches (n) allowed between sample tags when generating the catalog set to 2. For further details regarding the choice of Stacks parameters see 51,52.
Prior to proceeding with downstream analyses, potential bacterial contaminant sequences were removed. This was done using the taxonomic sequence classifier Kraken2, version 2.0.8-beta 53, to identify and subsequently blacklist those loci, following steps 2 and 3 on https://github.com/DerrickWood/kraken2/wiki/Manual#custom-databases and using the library "bacteria: RefSeq complete bacterial genomes/proteins". All diploid loci were also identified and blacklisted to retain only haploid loci (since most dinoflagellates are haploid). Before filtering, a total of 6,450,531 RAD-tag loci were identified in the data set, with a mean of 219,299 per individual. Of these, 3.3% were classified as bacterial by Kraken. Only the first SNP in each RAD-tag was used for further analyses to avoid linked loci; these are hereafter referred to as RAD-loci.
[...] converted using PGDSpider and subsequently used with the software BayeScan 2.1 (http://cmpg.unibe.ch/software/BayeScan/). For loci under strong selection pressure, the RAD sequences were blasted against a non-redundant database formed by merging the 8 transcriptomes (ID lines were concatenated).
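The two filtering steps described in this section (dropping blacklisted loci, then keeping only the first SNP in each RAD-tag) can be sketched as follows. This is an illustrative reimplementation, not the Stacks code path used in the study. The record layout `(locus_id, position, alleles)` and the function name `first_snp_per_locus` are assumptions.

```python
def first_snp_per_locus(snps, blacklist=frozenset()):
    """Keep one SNP per RAD-tag: the one at the lowest position.

    `snps` is an iterable of (locus_id, position, alleles) records in
    any order. Loci listed in `blacklist` (e.g. putative bacterial or
    diploid loci) are skipped entirely.
    Returns a dict mapping locus_id -> (position, alleles).
    """
    kept = {}
    for locus_id, position, alleles in snps:
        if locus_id in blacklist:
            continue
        # Replace the stored SNP only if this one lies earlier in the tag.
        if locus_id not in kept or position < kept[locus_id][0]:
            kept[locus_id] = (position, alleles)
    return kept
```

Keeping a single SNP per tag trades information for independence: SNPs on the same tag of roughly 100 bp are physically linked, which would violate the assumption of independent loci made by many downstream population genetic methods.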

Unique loci when contrasting freshwater and saline populations
To explore differences between freshwater populations and those adapted to brackish saline water habitats, we identified unique loci with annotated matches. We utilized the population division "salinity" (see above). Loci that were unique to one population (either freshwater or saline) were saved in fasta format and then blasted against the merged transcriptome. For all blasted loci, the GO terms of the top hit's top annotation were tallied across loci to a per-population total. Loci whose top hits lacked annotation were ignored, as were loci without blast hits. A subset of the GO terms was unique to one of the two populations. These were selected for in-depth gene ontology analysis to determine the potential function of the transcripts involved. The threshold for further evaluation was set to at least a tenfold difference between freshwater and saline.
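The tallying and thresholding steps described above can be sketched as follows. The input layout (a mapping from locus ID to the GO terms of its top annotated hit), the function names, and the treatment of a zero count in the other group as satisfying the tenfold criterion are assumptions made for this example. The minimum of 10 loci per GO term follows the stringency stated in the Table 3 caption.

```python
from collections import Counter


def tally_go_terms(locus_to_terms):
    """Count, per GO term, the number of loci whose top hit carries it.

    Loci without a blast hit or without annotation are represented by
    None or an empty list and are ignored.
    """
    counts = Counter()
    for terms in locus_to_terms.values():
        if terms:
            counts.update(set(terms))  # each locus counts once per term
    return counts


def contrasting_terms(group, other, fold=10, min_loci=10):
    """GO terms at least `fold` times more frequent in `group` than `other`.

    Terms absent from `other` pass the fold criterion automatically.
    Returns (term, count) pairs sorted by descending count, then by term.
    """
    selected = [
        (term, n)
        for term, n in group.items()
        if n >= min_loci and n >= fold * other.get(term, 0)
    ]
    return sorted(selected, key=lambda item: (-item[1], item[0]))
```

With `saline = tally_go_terms(...)` and `fresh = tally_go_terms(...)`, calling `contrasting_terms(saline, fresh)` and `contrasting_terms(fresh, saline)` gives the two lists of candidate terms for closer inspection.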

Figure 1. Phylogenetic and geographic distribution of the dinoflagellate Apocalathium. The phylogenetic tree of the dinoflagellate genus Apocalathium is based on ITS2-LSU rRNA (from Annenkova et al. 2015). Sampling locations of A. malmogiense were in the Baltic Sea, A. cf. malmogiense in Siberia (Lake Baikal), A. aciculiferum in freshwater lakes in Scandinavia, and A. cf. malmogiense in Antarctic lakes of the Vestfold Hills. Localities: hig = Highway Lake, ver = Lake Vereteno, mcn = Lake McNeil, gof = Gulf of Finland, tvä = Tvärminne, cop = Sankt Jorgens Sjø in Copenhagen, erk = Lake Erken, bai = Lake Baikal. The figure was made using the world map image accessible under the Creative Commons license (https://wordassociations.net/en/pictures?id=neocreo-Blue_World_Map), and the Google Earth website for the insets (https://earth.google.com/, accessed on 14 September 2023). The free graphics editor Inkscape (https://inkscape.org/) was used to edit the images.

Figure 3. Apocalathium population Structure analysis based on 345 SNPs. Each color represents a putative population. K denotes the number of putative populations. (A) Model allowing no admixture and independent alleles, with the putative number of populations varied between K = 2-4. (B) Model allowing admixture and independent alleles, with the putative number of populations varied between K = 2-4. (C) Evanno plot showing that the delta K value for (A) is highest for K = 3. (D) Evanno plot showing that the delta K value for (B) is highest for K = 4. hig = Highway Lake, ver = Lake Vereteno, mcn = Lake McNeil, gof = Gulf of Finland, tvä = Tvärminne, cop = Sankt Jorgens Sjø in Copenhagen, erk = Lake Erken, bai = Lake Baikal.

Table 1 column headers: Population; Total RAD sites; Variant sites; Private loci; Nucleotide diversity (π); Nei total heterozygosity.

Table 2. Population differentiation versus distance. Pairwise genetic population differentiation (Phi ST) of populations from all eight sites above the diagonal. Pairwise geographic distance (km) below the diagonal. All differences except those with n.s. as superscript are significant.

Table 3. Unique RAD loci in freshwater versus saltwater strains. Top table: top 20 gene ontology terms of sequences with matches between RADSeq loci and RNA transcripts that are unique for saline populations. For stringency only GO terms with at least 10 loci were considered. Lower table: top gene ontology terms of sequences with matches between RADSeq loci and RNA transcripts that are unique for freshwater populations. For stringency only GO terms with at least 10 loci were considered.